Abstract

In this paper we revisit the Bialynicki-Birula & Mycielski uncertainty principle [1]
and its cases of equality. This Shannon entropic version of the well-known Heisenberg
uncertainty principle can be used when dealing with variables that admit no
variance. We extend this uncertainty principle to Renyi entropies. We
recall that in both the Shannon and Renyi cases, and for a given dimension n, the only
case of equality occurs for Gaussian random vectors. We show, however, that as n grows
the bound is also asymptotically attained by n-dimensional
Student-t and Student-r distributions. A complete analytical study is performed
in a special case of a Student-t distribution. We also show numerically that this
effect exists for the particular case of an n-dimensional Cauchy variable, whatever
the Renyi entropy considered, extending the results of Abe [2] and illustrating the
analytical asymptotic study of the Student-t case. In the Student-r case, we show
numerically that the same behavior occurs for uniformly distributed vectors. These
particular cases and the other ones investigated in this paper are interesting since they
show that this asymptotic behavior cannot be considered as a "Gaussianization" of
the vector when the dimension increases.
1 Introduction



Let us consider an n-dimensional wave packet $\Psi_n(x)$ and denote by
$\tilde\Psi_n(u) = (2\pi)^{-n/2}\int_{\mathbb{R}^n}\Psi_n(x)\,e^{-\imath u^t x}\,dx$ its Fourier transform
(consider for example the position of a particle and its momentum). In the following, we will denote
by $X_n$ a zero-mean random vector with probability density function (pdf)
$f_n(x) = |\Psi_n(x)|^2$ and by $\tilde X_n$ a random vector with pdf $\tilde f_n(x) = |\tilde\Psi_n(x)|^2$ (by
Parseval's relation, this is a pdf). Vectors $X_n$ and $\tilde X_n$ are called conjugated.
The well-known Heisenberg uncertainty principle (H.U.P.) relates the "information"
available in two conjugated random vectors, stating that the product
of their variances is larger than a given bound, namely

$$\left(E\left[X_n^t X_n\right]\,E\left[\tilde X_n^t \tilde X_n\right]\right)^{1/2} \ge \frac{n}{2} \qquad (1)$$

(see also [3] for a matrix-variate extension of this result). The H.U.P. is important
in physics since it expresses the impossibility of an arbitrarily accurate
preparation of both position and momentum of a particle. This inequality
also finds applications in areas of signal processing such as time-frequency
analysis [4,5], where it is known as the Heisenberg-Gabor inequality. However,
the H.U.P. has a meaning only if the quantities in balance exist.

To bypass this restriction, Bialynicki-Birula & Mycielski showed in 1975 [1]
that the H.U.P. can be extended to information theoretic measures: more
precisely, they showed that the sum of the Shannon entropy rates of $X_n$ and
of $\tilde X_n$ verifies the Bialynicki-Birula & Mycielski inequality (B.B.M.I.)

$$\frac{H(X_n) + H(\tilde X_n)}{n} \ge 1 + \log\pi \qquad (2)$$

where the Shannon entropy is $H(X_n) = -\int f_n \log f_n$ (and likewise for $\tilde X_n$).
The B.B.M.I. can be expressed equivalently via the entropy power
$N(X_n) = \frac{1}{2\pi e}\exp\left(\frac{2}{n}H(X_n)\right)$ (as given in [3]) by

$$\left(N(X_n)\,N(\tilde X_n)\right)^{1/2} \ge \frac{1}{2}. \qquad (3)$$
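As a quick sanity check of (2) and (3), the following minimal sketch evaluates both sides for an isotropic Gaussian wave packet. It assumes the standard fact that, with the Fourier convention used above, a Gaussian packet whose position variance is `sigma2` per component has conjugate variance `1 / (4 * sigma2)`; the function names are illustrative and not taken from the paper.

```python
import math

def gaussian_entropy(n, sigma2):
    """Shannon entropy (in nats) of an n-dimensional N(0, sigma2 * I) vector."""
    return 0.5 * n * math.log(2.0 * math.pi * math.e * sigma2)

def entropy_power(n, h):
    """Entropy power N = exp(2 h / n) / (2 pi e) of a vector with entropy h."""
    return math.exp(2.0 * h / n) / (2.0 * math.pi * math.e)

def bbmi_sides(n, sigma2):
    """Return (lhs of (2), rhs of (2), lhs of (3)) for a Gaussian packet."""
    # Assumption: Gaussian packet, so the conjugate variance is 1 / (4 sigma2).
    h_x = gaussian_entropy(n, sigma2)
    h_conj = gaussian_entropy(n, 1.0 / (4.0 * sigma2))
    lhs = (h_x + h_conj) / n
    rhs = 1.0 + math.log(math.pi)
    power_product = math.sqrt(entropy_power(n, h_x) * entropy_power(n, h_conj))
    return lhs, rhs, power_product

if __name__ == "__main__":
    print(bbmi_sides(5, 0.3))
```

Whatever the dimension and the variance, the first two returned values coincide and the third equals 1/2, which is the Gaussian case of equality.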



The B.B.M.I. (2) is stronger than the H.U.P. (1): as shown in [1], it can be
applied also to variables with infinite variance, provided that their Shannon
entropy exists: it is the case for a Cauchy pdf
$f_n(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\pi^{\frac{n+1}{2}}}\,(1 + x^t x)^{-\frac{n+1}{2}}$, for
example. Moreover, it is shown in [1] that B.B.M.I. (2) implies H.U.P. (1) when
dealing with variables that admit a variance $^1$. Like inequality (1), the B.B.M.I.
finds applications in signal processing (see [4,5] and references therein). In
physics, maximization without constraint of the sum of the entropies that appear
in the B.B.M.I. has been suggested [6] as an interesting counterpart to
the classical "maximum entropy under constraint" approach for the derivation
of the wave functions associated with atomic systems. As for the Heisenberg
inequality, the lower bound in (2) is attained only for Gaussian wave packets.
But in their paper [2], Abe et al. showed that this bound is also asymptotically
reached by n-dimensional Cauchy vectors when n increases. In this paper, we
will extend this observation by exhibiting other families of distributions that
show the same asymptotic behavior. Moreover, we will focus on the possible
interpretations of this behavior, namely a "Gaussianization effect": one
natural interpretation is that these distributions get closer to the Gaussian
distribution, in some sense to be determined, as the dimension increases.
This interpretation will be proved erroneous by providing some cases in which
a distribution asymptotically reaches the bound of the B.B.M.I. but remains at a
non-zero (even infinite) distance from any Gaussian distribution. As we will explain
in the conclusion, the effect observed is mainly due to the normalization
1/n, which may be too strong in the considered non-i.i.d. context.

This paper is organized as follows: 

• In the first part, we come back to the Bialynicki-Birula & Mycielski uncertainty
relation, we reformulate it and we generalize it to the Renyi entropies.
We then give the expression of the entropy rates under consideration when
dealing with elliptical random vectors.

• We then address the study of the asymptotic cases of equality in the generalized
B.B.M.I., particularizing to the family of n-dimensional Student-t
variables with m degrees of freedom. For this class of variables, we provide
an upper bound of the sum of the entropy rates in balance, which allows us
to evaluate the asymptotic behavior of this quantity. We then perform the
complete analytical study in the particular case m = n + 2, confirming that
the lower bound of the uncertainty relation is attained asymptotically as
n increases. In a second illustration, we revisit the Cauchy case (m = 1)
as studied by Abe in the context of the Renyi formulation of the uncertainty
relation, and we show that the effect observed by Abe remains for
any "admissible" Renyi entropy.

• Secondly, we explore what happens in the n-dimensional Student-r class
of variables with m degrees of freedom. We present the case m = n, corresponding
to the uniform distribution in the n-dimensional sphere, as well


Since under covariance constraint N is maximum in the Gaussian context, 
N(X n ) < N(G n ) = no 2 where G n is Gaussian with the same covariance than 
X n , and similarly for X n : the product of the variances is higher than the product 
of the entropy power, implying the H.U.P. 
as some other cases.
• Finally, we provide some clarifications about this asymptotic behavior, explaining
why a Gaussianization effect, measured in the distribution or in
the information divergence sense, is not necessary to asymptotically reach
equality in the B.B.M.I.



2 The Renyi entropy uncertainty relation 



The proof of the B.B.M.I. (2) is based on the Beckner inequality relating the
norms of any (wave) function $\Psi_n$ of $L^p(\mathbb{R}^n)$ and of its Fourier transform $\tilde\Psi_n$,
i.e.

$$\|\tilde\Psi_n\|_q \le \left(C_{p,q}\right)^n \|\Psi_n\|_p \qquad (4)$$

where p and q are conjugated, i.e. $\frac{1}{p} + \frac{1}{q} = 1$, where $p \in\, ]1; 2]$ and where $C_{p,q}$
is the Babenko constant expressed as
$C_{p,q} = \left(\frac{p}{2\pi}\right)^{\frac{1}{2p}}\left(\frac{q}{2\pi}\right)^{-\frac{1}{2q}}$ [7,8]. In [1], the
authors consider the non-negative function $W(q) = (C_{p,q})^n\|\Psi_n\|_p - \|\tilde\Psi_n\|_q$ defined
for $q \ge 2$. Since $W(2) = 0$ by Parseval's identity and since $W(q)$ is non-negative,
the derivative of $W(q)$ at $q = 2$ is non-negative as well, which leads to inequality
(2).

The B.B.M.I. (2) can be extended to other measures of information such as
the Renyi entropies, which include the Shannon entropy as a special case. The
Renyi entropy with parameter $\lambda$ is defined as

$$H_\lambda(X_n) = \frac{1}{1-\lambda}\log\left(\int f_n^\lambda\right) \qquad (5)$$

for $\lambda \ne 1$ [9]. When $\lambda$ tends to 1, by l'Hospital's rule, $H_\lambda$ converges to the
Shannon entropy, which will thus be denoted by continuity $H_1 = H$. The Renyi
entropy is widely used, not only in physics (e.g. statistical mechanics, physics
of turbulence, cosmology, see [10,11,12] and references therein), but in various
other areas such as signal processing (time scale analysis, decision problems,
machine learning, see [4,13,14,15] and references therein) or image processing
(image matching, image registration, see [16,17] and references therein).

Theorem 1 With $\frac{1}{p} + \frac{1}{q} = 1$ and for any p > 1, the B.B.M.I. writes in terms
of Renyi entropy as

$$\frac{H_{p/2}(X_n) + H_{q/2}(\tilde X_n)}{n} \ge \log(2\pi) + \frac{\log p}{p-2} + \frac{\log q}{q-2} \qquad (6)$$
Proof. It is straightforward that

$$\log\|\Psi_n\|_p = \log\left(\int_{\mathbb{R}^n} f_n^{p/2}\right)^{1/p} = \frac{2-p}{2p}\,H_{p/2}(X_n).$$

Hence, taking the logarithm of both sides of (4), using the conjugation relation
$\frac{1}{p} + \frac{1}{q} = 1$ and $1 < p < 2$ leads to

$$\frac{2-p}{2p}\,H_{p/2}(X_n) - \frac{2-q}{2q}\,H_{q/2}(\tilde X_n) \ge n\left(\frac{1}{2p} - \frac{1}{2q}\right)\log(2\pi) - \frac{n}{2p}\log p + \frac{n}{2q}\log q.$$

But p and q are conjugated so that $\frac{2-q}{2q} = -\frac{2-p}{2p}$ and $\frac{1}{2p} - \frac{1}{2q} = \frac{2-p}{2p}$. Hence, since 2 - p > 0,
multiplying each side by $\frac{2p}{n(2-p)}$ one finally obtains uncertainty relation (6).

Since the pdf of the conjugate of $\tilde X_n$ coincides with that of $X_n$ up to the reflection
$x \to -x$, which leaves the entropies unchanged, $X_n$ and $\tilde X_n$ play a symmetrical role
in (6) and can then be exchanged: as a consequence inequality (6) holds for
any p > 1, provided that the entropies in balance exist. ■

By taking the limit $p \to 2$ in (6), the B.B.M.I. (2) is recovered, proving that
it is a particular case of (6). We notice that a similar generalization exists for
the Tsallis entropy [18,19,20], which is widely used in statistical physics and
related to the Renyi entropy by an invertible transformation [10,11,12,21,22].
In this paper we will focus on the Renyi entropy since the resulting form of the
uncertainty relation is very similar to the B.B.M.I. Furthermore, the quantity
$\frac{1}{n}H_\lambda(X_n)$, also known as the entropy rate (i.e. entropy per sample), is very
often considered in the information theoretic context [13].
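The limit $p \to 2$ can be checked numerically. The sketch below evaluates the right-hand side of (6) and compares it with the Shannon bound $1 + \log\pi$ of (2); the helper names `conjugate` and `renyi_bound` are hypothetical and only serve this illustration.

```python
import math

def conjugate(p):
    """Conjugate exponent q such that 1/p + 1/q = 1 (p > 1)."""
    return p / (p - 1.0)

def renyi_bound(p):
    """Right-hand side of (6), for p > 1 and p != 2."""
    q = conjugate(p)
    return (math.log(2.0 * math.pi)
            + math.log(p) / (p - 2.0)
            + math.log(q) / (q - 2.0))

# Shannon bound of (2), expected as the limit of renyi_bound(p) when p -> 2.
shannon_bound = 1.0 + math.log(math.pi)

if __name__ == "__main__":
    for p in (1.5, 1.9, 2.0001, 3.0, 10.0):
        print(p, renyi_bound(p), shannon_bound)
```

The bound is symmetric under the exchange of p and q, in line with the symmetric roles of the two conjugated vectors.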

Theorem 2 For a given n, the case of equality in (2) or (6) is reached if and
only if $X_n$ and $\tilde X_n$ are Gaussian random vectors.

Proof. It is straightforward to check that Gaussian vectors $X_n$ and $\tilde X_n$ reach
equality in either inequality (2) or (6) by plugging in the Gaussian pdfs. Conversely,
for $p \ne 2$, since only Gaussian wave functions achieve equality in (4),
as proved in [23], the Gaussian waves are the only wave packets which
achieve equality in (6). The Shannon case p = 2 is more subtle, but it has been
recently proved that equality is reached in inequality (2) only in the Gaussian
case [24]. ■
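The direct part of the proof ("plugging in the Gaussian pdfs") can be illustrated as follows. This is a minimal sketch that assumes the standard closed form of the Renyi entropy of an isotropic Gaussian vector and, as before, a conjugate variance of `1 / (4 * sigma2)`; the function names are illustrative.

```python
import math

def gaussian_renyi_entropy(n, sigma2, lam):
    """Renyi entropy of order lam != 1 of N(0, sigma2 * I_n), in nats."""
    return (0.5 * n * math.log(2.0 * math.pi * sigma2)
            + 0.5 * n * math.log(lam) / (lam - 1.0))

def renyi_gap(n, sigma2, p):
    """Left-hand side minus right-hand side of (6) for a Gaussian packet."""
    q = p / (p - 1.0)
    # Assumption: the conjugate of N(0, sigma2 I) is N(0, I / (4 sigma2)).
    lhs = (gaussian_renyi_entropy(n, sigma2, 0.5 * p)
           + gaussian_renyi_entropy(n, 1.0 / (4.0 * sigma2), 0.5 * q)) / n
    rhs = (math.log(2.0 * math.pi)
           + math.log(p) / (p - 2.0)
           + math.log(q) / (q - 2.0))
    return lhs - rhs

if __name__ == "__main__":
    for p in (1.2, 3.0, 10.0):
        print(p, renyi_gap(4, 0.7, p))
```

The gap vanishes (up to rounding) for every admissible p, dimension and variance.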

However, in [2], Abe showed in the case of Shannon entropies that n-variate
Cauchy vectors reach equality in (2) asymptotically with n. We will show in
the next part that this result extends in fact to inequality (6) for any value
p/2 of the entropy index for which the entropy exists. Furthermore we will
show that the Cauchy case is not the only one exhibiting this behavior.



3 Asymptotic cases of equality 

As previously stated, although equality is achieved in (6) only by Gaussian
wave packets, n-dimensional Cauchy wave packets reach equality asymptotically
with the dimension n. An important question is to understand which
ingredients lead to this asymptotic behavior.

We first remark that in the Cauchy case presented by Abe and in the cases
studied below, the components of the random vector are dependent: in other
words the wave function $\Psi_n$ is not separable. This condition is clearly required
since, dealing with independent identically distributed (i.i.d.) components
(separable wave function $\Psi_n$), the entropy rate coincides with the entropy
of a single component: $H_\lambda(X_n) = n H_\lambda(X)$ where X is any component
of $X_n$. But if $X_n$ is i.i.d., the conjugated vector $\tilde X_n$ is i.i.d. as well since the
Fourier transform of a separable function is separable. Hence with obvious
notations $H_\lambda(\tilde X_n) = n H_\lambda(\tilde X)$ and
$\frac{H_{p/2}(X_n) + H_{q/2}(\tilde X_n)}{n} = H_{p/2}(X) + H_{q/2}(\tilde X)$: as a
consequence, no asymptotic effect can appear in the i.i.d. setup.

Furthermore, for any invertible matrix M, the Renyi $\lambda$-entropy of $Y_n = M X_n$
is expressed as $H_\lambda(Y_n) = \log|M| + H_\lambda(X_n)$ and since $\tilde Y_n = M^{-t}\tilde X_n$, the sum
of the entropy rates is (matrix) scale invariant: it is thus impossible to reach
equality by introducing such simple correlation between the components of
the random vectors.

Except for these basic requirements, the answer remains open as far as we
know. The study of the following cases, showing very different behaviors, attempts
to give some elements of an answer.



3.1 Derivation of the entropy rates in the elliptic case 

We concentrate in the following on the case of elliptical random vectors. A vector
$X_n$ is called elliptical if its pdf $f_n$ is a single-valued function of a quadratic
form [25]

$$f_n(x) = |\Sigma_n|^{-1/2}\, d_n\left(\left(x^t \Sigma_n^{-1} x\right)^{1/2}\right) \qquad (7)$$

for some function $d_n$ and where $\Sigma_n$ is a positive definite symmetric matrix. $\Sigma_n$
is called the characteristic matrix. In other words, the random vector $\Sigma_n^{-1/2} X_n$ is
isotropic. Due to the matrix scale invariance of the studied entropic inequalities
evoked above, we will consider without loss of generality in the following that
$\Sigma_n$ is proportional to the identity matrix $I_n$. Otherwise, except in (6) where
there is no influence, $X_n$ and $\tilde X_n$ must be understood as $\Sigma_n^{-1/2} X_n$ and $\Sigma_n^{1/2}\tilde X_n$
respectively.

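Theorem 3 If $X_n$ is elliptical as in (7) (with $\Sigma_n = I_n$), then the sum of the
entropy rates of $X_n$ and $\tilde X_n$ is

$$U_p(X_n) = \frac{H_{p/2}(X_n) + H_{q/2}(\tilde X_n)}{n} = \frac{2}{n}\log\left(\frac{2\pi^{n/2}}{\Gamma(n/2)}\right) + \frac{2}{n(2-p)}\log\int_0^{+\infty} r^{\frac{(n-1)(2-p)}{2}} D_n(r)^{\frac{p}{2}}\,dr + \frac{2}{n(2-q)}\log\int_0^{+\infty} r^{\frac{(n-1)(2-q)}{2}} E_n(r)^{\frac{q}{2}}\,dr \qquad (8)$$

for $p \ne 2$ and

$$U_2(X_n) = \frac{H(X_n) + H(\tilde X_n)}{n} = \frac{2}{n}\log\left(\frac{2\pi^{n/2}}{\Gamma(n/2)}\right) + \frac{n-1}{n}\int_0^{+\infty}\left(D_n(r) + E_n(r)\right)\log r\,dr - \frac{1}{n}\int_0^{+\infty}\left(D_n(r)\log D_n(r) + E_n(r)\log E_n(r)\right)dr \qquad (9)$$

for p = 2, where

$$D_n(r) = \frac{2\pi^{n/2}}{\Gamma(n/2)}\, r^{n-1} d_n(r), \qquad E_n(r) = \left(\int_0^{+\infty} (\rho r)^{\frac{1}{2}}\, D_n(\rho)^{\frac{1}{2}}\, J_{\frac{n}{2}-1}(\rho r)\,d\rho\right)^2 \qquad (10)$$

are the pdfs of the Euclidean norms $\|X_n\|$ and $\|\tilde X_n\|$ respectively.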
Proof. Using spherical coordinates, a simple computation shows that the pdf
$D_n(r)$ of the Euclidean norm $\|X_n\|$ of $X_n$ is

$$D_n(r) = \frac{2\pi^{n/2}}{\Gamma(n/2)}\, r^{n-1} d_n(r) \qquad (11)$$

if $r \ge 0$ and is zero otherwise (see also [26, eq. (7)]).

Let us show now that, $X_n$ being elliptical, $\tilde X_n$ is elliptical: the pdf $\tilde f_n$ of the
conjugate variable is given by

$$\tilde f_n(u) = (2\pi)^{-n}\left|\int_{\mathbb{R}^n} f_n^{1/2}(x)\,e^{-\imath u^t x}\,dx\right|^2 = (2\pi)^{-n}\left|\int_{\mathbb{R}^n} d_n^{1/2}\left((x^t x)^{1/2}\right)e^{-\imath u^t x}\,dx\right|^2.$$

Applying [26, eq. (5)], we obtain

$$\tilde f_n(u) = \left(\int_0^{+\infty} \rho^{\frac{n}{2}}\,(u^t u)^{\frac{2-n}{4}}\, J_{\frac{n}{2}-1}\left(\rho\,(u^t u)^{\frac{1}{2}}\right) d_n^{1/2}(\rho)\,d\rho\right)^2$$

where $J_\nu$ is the Bessel function of the first kind and order $\nu$. This result proves
that $\tilde X_n$ is elliptical. From the above expression of $\tilde f_n$ and the forms (7)-(11),
the pdf $E_n$ of $\|\tilde X_n\|$ then writes

$$E_n(r) = \left(\int_0^{+\infty} (\rho r)^{\frac{1}{2}}\, D_n(\rho)^{\frac{1}{2}}\, J_{\frac{n}{2}-1}(\rho r)\,d\rho\right)^2 \qquad (12)$$

for $r \ge 0$ and zero otherwise. We remark that $E_n$ is the square of the $\left(\frac{n}{2}-1\right)$-order
Hankel transform of $D_n^{1/2}$ (see also [27,28]).

Now the $\lambda$-Renyi entropy of $X_n$ is

$$H_\lambda(X_n) = \frac{1}{1-\lambda}\log\int_{\mathbb{R}^n} f_n^\lambda(x)\,dx = \frac{1}{1-\lambda}\log\left(\frac{2\pi^{n/2}}{\Gamma(n/2)}\int_0^{+\infty} r^{n-1} d_n^\lambda(r)\,dr\right),$$

the last equality being an application of [29, 4.642]. This yields the following
expression for the $\lambda$-Renyi entropy of $X_n$,

$$H_\lambda(X_n) = \log\frac{2\pi^{n/2}}{\Gamma(n/2)} + \frac{1}{1-\lambda}\log\int_0^{+\infty} r^{(n-1)(1-\lambda)} D_n^\lambda(r)\,dr \qquad (13)$$

The Shannon entropy of $X_n$ can be deduced from (13) using l'Hospital's rule:

$$H(X_n) = \log\frac{2\pi^{n/2}}{\Gamma(n/2)} + (n-1)\int_0^{+\infty} D_n(r)\log r\,dr - \int_0^{+\infty} D_n(r)\log D_n(r)\,dr \qquad (14)$$

We have shown that $\tilde X_n$ is again elliptical so that its $\lambda$-Renyi entropy is again
expressed as (13) with $D_n$, the pdf of $\|X_n\|$, replaced by $E_n$, the pdf of $\|\tilde X_n\|$,
leading to results (8) and (9). ■
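Formula (14) lends itself to a simple numerical check. The sketch below, with hypothetical helper names, evaluates (14) by a midpoint rule for the radial pdf of a standard Gaussian vector (a chi distribution) and compares the result with the known entropy $\frac{n}{2}\log(2\pi e)$; the truncation radius and the number of steps are arbitrary choices.

```python
import math

def radial_shannon_entropy(n, radial_pdf, r_max, steps=200_000):
    """Evaluate (14) with a midpoint rule on (0, r_max)."""
    # log of the surface factor 2 pi^(n/2) / Gamma(n/2)
    log_surface = math.log(2.0) + 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n)
    h = r_max / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        d = radial_pdf(r)
        if d > 0.0:  # skip points where the pdf underflows to zero
            total += d * ((n - 1) * math.log(r) - math.log(d))
    return log_surface + total * h

def gaussian_radial_pdf(n):
    """pdf of ||X_n|| for X_n ~ N(0, I_n), i.e. the chi distribution."""
    log_c = math.log(2.0) - 0.5 * n * math.log(2.0) - math.lgamma(0.5 * n)
    return lambda r: math.exp(log_c + (n - 1) * math.log(r) - 0.5 * r * r)

if __name__ == "__main__":
    n = 3
    print(radial_shannon_entropy(n, gaussian_radial_pdf(n), 12.0))
    print(0.5 * n * math.log(2.0 * math.pi * math.e))
```

The same routine can be fed any other radial pdf $D_n$, as long as the truncation at `r_max` is harmless for the tails at hand.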

We have now all the material to investigate more deeply some special cases of 
asymptotic equalities in (6). 



3.2 The general Student-t case 



We consider in this subsection the case where $X_n$ is distributed according to
a Student-t law with m degrees of freedom, i.e. $^2$

$$f_n(x) = \frac{\Gamma\left(\frac{n+m}{2}\right)}{\pi^{n/2}\,\Gamma\left(\frac{m}{2}\right)}\,(1 + x^t x)^{-\frac{n+m}{2}} \qquad (15)$$

where m > 0. When m = 1, $X_n$ is the well-known multivariate Cauchy vector,
as studied by Abe in [2]. The Student-t variables play an important role in
statistics because of their power-law behavior and their simple analytic expression.
Moreover, they maximize the Renyi/Tsallis entropy with parameter
$\lambda = 1 - \frac{2}{n+m}$ under covariance constraint [12,21,22,31,32].

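Theorem 4 If $X_n$ is Student-t distributed, then the pdf of $\|X_n\|$ is

$$D_n(r) = \frac{2\,\Gamma\left(\frac{n+m}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\Gamma\left(\frac{m}{2}\right)}\,\frac{r^{n-1}}{(1+r^2)^{\frac{n+m}{2}}} \qquad (16)$$

2 More rigorously the random variable $\sqrt{m}\,X_n$ is Student-t [30], but we have seen
that the sum of the entropy rates is insensitive to any scaling factor.

while the pdf of $\|\tilde X_n\|$ is

$$E_n(r) = \frac{2^{3-\frac{n+m}{2}}\,\Gamma\left(\frac{n+m}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\Gamma\left(\frac{m}{2}\right)\Gamma^2\left(\frac{n+m}{4}\right)}\; r^{\frac{n+m}{2}-1}\, K^2_{\frac{n-m}{4}}(r) \qquad (17)$$

where $K_\nu$ is the modified Bessel function of the third kind with order $\nu$ (see
for example [29, 8.432]).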

Moreover, if $p > \frac{2n}{n+m}$, the sum of the entropy rates is

Up (x n ) — iog7r + (4+g t?nr s2 + ^fc) ((p - x ) io s r (t ) 

+ io g r ( ("+-)^- 2 " ) - io g r +p\o g v( 



n+m 



(18) 



+oo 

' log J r^ +n ~ l Kl^(r)dr 
o 



"(2-9) 



while, for p = 2, 

u 2 (x n ) = log tt + log 2 + f (log r (f ) - log r (s±s) + log r (=±2)) 
+ t (f ) + 2 V (^) ) + * V> (^) - ^ V> 



(19) 



n+m , 
- ' n+m 



~"T" | ^m^p2^ w+^T J J T 2 K n-m (r) log K n-m (r) OT 



Proof. (16) is straightforward from (15), (7) and (11) while (17) is a consequence
of (12) and [28, 9-28 (22)] (or [27, 8.5 (20)] or [29, 6.565-4]). Finally,
plugging (16) and (17) into (8) and using [29, 8.380-3] and $q = \frac{p}{p-1}$, the sum
of the entropy rates (18) for $p \ne 2$ follows. In the Shannon case, starting from
(9), recognizing $\int_0^{+\infty}\frac{r^{n-1}}{(1+r^2)^{\frac{n+m}{2}}}\,dr$ as a beta integral and using [29, 6.576-4]
to evaluate $\int_0^{+\infty} r^{-\lambda} K_\nu^2(r)\,dr$, we notice that $h(r)^\lambda \log(h(r)) = \frac{\partial}{\partial\lambda}h(r)^\lambda$ (with
$h(r) = r$ and $h(r) = 1 + r^2$) to finally obtain (19).

Notice that both from [29, 8.380-3] to ensure the existence of the terms in (18)
and from the asymptotics [30, 9.6.9 and 9.7.2] to ensure the existence of the
remaining integral, quantity $U_p(X_n)$ exists provided that

$$p > \frac{2n}{n+m} \qquad (20)$$
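As an illustration of (16) and (20), the sketch below checks numerically that the radial pdf (16) integrates to one and encodes the existence condition (20). The change of variable $r = \tan\theta$ turns the integral over the half-line into a beta-type integral on $(0, \pi/2)$; all function names are hypothetical.

```python
import math

def student_t_radial_pdf(n, m):
    """Radial pdf (16) of an n-dimensional Student-t vector, m degrees of freedom."""
    log_c = (math.log(2.0) + math.lgamma(0.5 * (n + m))
             - math.lgamma(0.5 * n) - math.lgamma(0.5 * m))
    return lambda r: math.exp(log_c + (n - 1) * math.log(r)
                              - 0.5 * (n + m) * math.log1p(r * r))

def integrate_half_line(f, steps=200_000):
    """Integrate f over (0, +inf) via r = tan(theta) and a midpoint rule."""
    h = 0.5 * math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        r = math.tan(theta)
        total += f(r) * (1.0 + r * r)  # dr = (1 + r^2) d(theta)
    return total * h

def renyi_exists(n, m, p):
    """Condition (20) for the existence of the entropies in balance."""
    return p > 2.0 * n / (n + m)

if __name__ == "__main__":
    print(integrate_half_line(student_t_radial_pdf(3, 1)))  # Cauchy case
    print(renyi_exists(3, 1, 2.0), renyi_exists(3, 1, 1.2))
```

The midpoint rule is accurate here for $m \ge 1$; for $m < 1$ the transformed integrand has an integrable singularity at $\theta = \pi/2$ and a finer quadrature would be advisable.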
Unfortunately, in the general case, neither (18) nor (19) can be further developed
analytically: numerical integration is necessary for the evaluation of
the remaining integral. We will see in section 3.2.1 that in the special case
m = n + 2, the remaining integral can be fully developed and hence the investigation
can be completely performed. However, an asymptotic result can be
obtained for any positive value of m, as expressed by the following theorem.

Theorem 5 For any positive value of the degree of freedom, the following
asymptotic expansions hold:

$$U_p(X_n) = \log(2\pi) + \frac{\log p}{p-2} + \frac{\log q}{q-2} + o(1) \quad \text{for } p \ne 2 \qquad (21)$$

$$U_2(X_n) = 1 + \log\pi + o(1) \qquad (22)$$

Hence, the lower bound of (6) is asymptotically attained when $n \to \infty$.
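Proof. Let us denote

$$I(\lambda, q) = \log\left(\int_0^{+\infty} r^{\frac{(m-n)q}{4}+n-1}\left(K_{\frac{n-m}{4}}(r)\right)^{\lambda} dr\right) \qquad (23)$$

Schwarz inequality implies that $\frac{\partial^2 I}{\partial\lambda^2} \ge 0$, showing that function $I(\lambda, q)$ is
convex in $\lambda$. As a consequence, for any $\lambda \in [1; 2]$ we have
$I(\lambda, q) = I((2-\lambda)\times 1 + (\lambda-1)\times 2,\, q) \le (2-\lambda)\,I(1, q) + (\lambda-1)\,I(2, q)$. If $\lambda > 2$ this
inequality is reversed. Since the remaining integral is $I(q, q)$, for $q \ne 2$ we
obtain the inequality

$$\frac{I(q, q)}{2-q} \le I(1, q) + \frac{q-1}{2-q}\,I(2, q) \qquad (24)$$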
Using [29, 6.561-16 and 6.576-4] to evaluate $I(1, q)$ and $I(2, q)$ respectively,
and the fact that p and q are conjugated, we finally obtain, for $p \ne 2$,

$$U_p(X_n) \le M(n, m, p)$$

M(n, m, p) = log(27r) + ^ ((p - 1) log r (f ) + log r ( ("+"f~ 2 " ) 

-iogr((^)+ P io g r(^) 

(25) 

+ (2 - p) log T ( n(p -g +mP ) - 2 log r ( 2n(p-2)+(n+m)p ^ 
- logT ( "(p~2)+™p ) + logr ( 2n(p-2)+(n+m)p ^ 

+iiogr(^) + ^iogr( f ) 



Now using the asymptotics of the log-gamma function [30, 6.1.41] and tedious
algebra, one obtains that the upper bound M verifies

$$M(n, m, p) = \log(2\pi) + \frac{\log p}{p-2} + \frac{\log q}{q-2} + o(1) \qquad (26)$$

whatever m > 0, dependent or not on n. In the case p = 2, one has by
continuity (both for $U_p$ and for $M(n, m, p)$)

$$U_2(X_n) \le M(n, m, 2) \qquad (27)$$

with



n+rri 
4 



M(n, m, 2) = log(27r) + f (log r (f ) - log r (f ) + log T (f ) - log r ( 

" 5^ (?) + ^ (^) + V> 

Using again the asymptotics of the log-gamma and psi functions, one obtains

$$M(n, m, 2) = 1 + \log\pi + o(1) \qquad (28)$$

Together with (6) these results confirm (21) and (22): the lower bound of (6) is attained
when $n \to \infty$ in the Student-t case. ■
3.2.1 The Student-t case with m = n + 2 degrees of freedom: a completely
analytical study

Assuming m = n + 2, the Bessel function in (17) is of order $-\frac{1}{2}$ and from [29,
8.469-3] we have

$$K_{-\frac{1}{2}}(r) = K_{\frac{1}{2}}(r) = \sqrt{\frac{\pi}{2r}}\;e^{-r} \qquad (29)$$

Notice first that from the analogy with (7)-(11) and using Euler's duplication
formula [29, 8.335-1], $\tilde X_n$ is distributed according to

$$\tilde f_n(x) = \frac{2^{n-1}\,\Gamma\left(\frac{n}{2}\right)}{\pi^{n/2}\,\Gamma(n)}\,\exp\left(-2\,(x^t x)^{1/2}\right) \qquad (30)$$

This distribution is called the multivariate generalization of the Laplace distribution $^3$
[36], or multivariate extension of the generalized Gaussian [37], or multivariate
exponential power [35]. As shown in [38], each particle of an ideal
relativistic photon gas in a container with rigid and diathermic walls has a
3-dimensional momentum X that follows distribution (30).

Theorem 6 The entropy sum in the case of the Student-t distribution with
m = n + 2 degrees of freedom is, for $p \ne 2$,

$$U_p(X_n) = \log\pi + \frac{2\log q - q\log 2}{q-2} + \frac{2}{n(2-p)}\left(\log 2 + \log\Gamma(n) - \log\Gamma\left(\tfrac{n}{2}\right) + \log\Gamma\left(\tfrac{(n+1)p-n}{2}\right) - \log\Gamma\left(\tfrac{(n+1)p}{2}\right)\right) \qquad (31)$$

and, for p = 2,

$$U_2(X_n) = 1 + \log\left(\frac{\pi}{2}\right) + \frac{n+1}{n}\left(\psi(n+1) - \psi\left(\tfrac{n+2}{2}\right)\right) \qquad (32)$$

where $\psi$ denotes the digamma function. Moreover, it holds asymptotically for $p \ne 2$

$$\frac{H_{p/2}(X_n) + H_{q/2}(\tilde X_n)}{n} = \log(2\pi) + \frac{\log p}{p-2} + \frac{\log q}{q-2} + o(1) \qquad (33)$$

while for p = 2,

$$\frac{H(X_n) + H(\tilde X_n)}{n} = 1 + \log\pi + o(1) \qquad (34)$$

3 See however [33,34,35] for a different definition of generalized multivariate Laplace
distributions.

Proof. (31) can be easily deduced from (18) using the fact that $q = \frac{p}{p-1}$
and Euler's duplication formula [29, 8.335-1]. The limit when $p \to 2$ can be
obtained from (19) or using a first order expansion of (31) and formula [29,
8.365-1]. This last result can also be found starting directly from the Shannon
entropy. Using [30, 6.1.41 and 6.3.18], (33) and (34) follow straightforwardly.
■

These results show that the Student-t distribution with m = n + 2 (or, by conjugation,
the multivariate exponential power pdf (30)) asymptotically achieves
equality in (6) (and (2)). Contrary to the result of Abe [2] for the Cauchy
distribution and Shannon entropy, the results obtained here are in analytical
form. We will discuss in section 3.4 the possible explanations of this
asymptotic behavior.
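The asymptotic equality in the case m = n + 2 can be observed numerically. The sketch below is not the authors' computation: it recomputes the Renyi entropies directly from the pdfs (15) and (30), using the elementary integrals $\int (1 + x^t x)^{-a}dx = \pi^{n/2}\Gamma(a - \frac{n}{2})/\Gamma(a)$ and $\int e^{-c\|x\|}dx = \frac{2\pi^{n/2}}{\Gamma(n/2)}\Gamma(n)/c^n$, compares the result with the closed form (31), and prints the gap to the lower bound of (6). All function names are hypothetical.

```python
import math

def student_t_renyi(n, m, lam):
    """Renyi entropy of order lam of the Student-t pdf (15); needs lam * (n + m) > n."""
    log_c = math.lgamma(0.5 * (n + m)) - 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * m)
    a = 0.5 * lam * (n + m)
    log_int = (lam * log_c + 0.5 * n * math.log(math.pi)
               + math.lgamma(a - 0.5 * n) - math.lgamma(a))
    return log_int / (1.0 - lam)

def exp_power_renyi(n, mu):
    """Renyi entropy of order mu of the conjugate pdf (30), proportional to exp(-2 ||x||)."""
    log_surface = math.log(2.0) + 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n)
    log_c = ((n - 1) * math.log(2.0) + math.lgamma(0.5 * n)
             - 0.5 * n * math.log(math.pi) - math.lgamma(n))
    log_int = mu * log_c + log_surface + math.lgamma(n) - n * math.log(2.0 * mu)
    return log_int / (1.0 - mu)

def entropy_rate_sum(n, p):
    """U_p(X_n) for m = n + 2, computed from the two pdfs (p > 1, p != 2)."""
    q = p / (p - 1.0)
    return (student_t_renyi(n, n + 2, 0.5 * p) + exp_power_renyi(n, 0.5 * q)) / n

def closed_form_31(n, p):
    """U_p(X_n) for m = n + 2, as given by (31)."""
    q = p / (p - 1.0)
    gammas = (math.log(2.0) + math.lgamma(n) - math.lgamma(0.5 * n)
              + math.lgamma(0.5 * (p * (n + 1) - n)) - math.lgamma(0.5 * p * (n + 1)))
    return (math.log(math.pi) + (2.0 * math.log(q) - q * math.log(2.0)) / (q - 2.0)
            + 2.0 * gammas / (n * (2.0 - p)))

def lower_bound(p):
    """Right-hand side of (6)."""
    q = p / (p - 1.0)
    return math.log(2.0 * math.pi) + math.log(p) / (p - 2.0) + math.log(q) / (q - 2.0)

if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 100000):
        print(n, entropy_rate_sum(n, 3.0) - lower_bound(3.0))
```

The printed gap stays positive, as required by Theorem 2, and shrinks as n grows.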



3.2.2 The Student-t case with 1 degree of freedom: revisiting the Cauchy case 

We deal in this section with the case of a multivariate Cauchy distribution as
previously studied by Abe in [2], i.e.

$$f_n(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\pi^{\frac{n+1}{2}}}\,(1 + x^t x)^{-\frac{n+1}{2}} \qquad (35)$$

Formulas (18) and (19) apply with m = 1: unfortunately, the remaining integrals
in this case cannot be further simplified and numerical evaluations are
required. Notice that in the Shannon case, the result found by Abe in [2] is
recovered, except that one of the two integrals, numerically evaluated in [2],
is in fact analytically expressed here.

Curves in Fig. 1 depict the behavior of (19) and (18) as a function of n, in
the Cauchy case (the integral is numerically evaluated), for p = 2 (Shannon
case), p = 3 and p = 10 respectively. The same behavior is observed for any
value of p tested. The first curve simply confirms the results by Abe [2]. It also
appears from these curves that the conclusion found by Abe remains true for
the Renyi version of the uncertainty relation, whatever the possible values of
p, which confirms (21). Hence, this behavior is not specific to the Shannon
entropy.
3.3 The general Student-r case



We consider now the case where $X_n$ is distributed according to a Student-r
law with m degrees of freedom $^4$, i.e.

$$f_n(x) = \frac{\Gamma\left(\frac{m}{2}+1\right)}{\pi^{n/2}\,\Gamma\left(\frac{m-n}{2}+1\right)}\,\left(1 - x^t x\right)_+^{\frac{m-n}{2}} \qquad (36)$$

where m > n - 2 and $(\cdot)_+ = \max(\cdot, 0)$. When m = n, vector $X_n$ is uniformly
distributed in the n-dimensional unit ball $x^t x \le 1$. The Student-r variables also play
an important role in probability, since they appear as n-dimensional
marginals of the uniform distribution on the sphere in $\mathbb{R}^{m+2}$; they are also
maximizers (for $\lambda = 1 + \frac{2}{m-n}$) of the Renyi/Tsallis entropy under covariance
constraint [12,21,22,31].

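Theorem 7 If $X_n$ is Student-r distributed, then the pdf of $\|X_n\|$ is

$$D_n(r) = \frac{2\,\Gamma\left(\frac{m}{2}+1\right)}{\Gamma\left(\frac{n}{2}\right)\Gamma\left(\frac{m-n}{2}+1\right)}\; r^{n-1}\left(1 - r^2\right)^{\frac{m-n}{2}} \qquad (37)$$

for $r \in [0; 1]$ and 0 otherwise, while the pdf of $\|\tilde X_n\|$ is

$$E_n(r) = \frac{2^{\frac{m-n}{2}+1}\,\Gamma\left(\frac{m}{2}+1\right)\Gamma^2\left(\frac{m-n}{4}+1\right)}{\Gamma\left(\frac{n}{2}\right)\Gamma\left(\frac{m-n}{2}+1\right)}\; r^{-\frac{m-n}{2}-1}\, J^2_{\frac{m+n}{4}}(r) \qquad (38)$$

for $r \in [0; +\infty)$ and 0 otherwise.

Moreover, if $q > \frac{4n}{m+n+2}$, the sum of the entropy rates is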



U p (X n ) = log7T + 

' 1 



(4 + (m — n) g) log 2 



2n(2 - q) 

^y((p-i) log r (f) + log r + 
io g r( (m -y 2n + i) -pi og r(M + i 



+oo 



n(2-g) 



— (m + ra) g 



e-rlog / r 4 



+71-1 



</ m + ra (r) 



o 



(39) 



4 m is named degree of freedom by misuse of language and may be called shape 
parameter. This choice is adopted by analogy with the Student-t case. 
for any m > n - 2, while, for p = 2,



U 2 (X n ) = log(27r) + I (i og r + l) - logr (f + l) 

-logr + l)) + ^ ^ (n) + 2 ^ (sp + l)) 

+ + + i) (40) 

2 ^ + l r ( f+1 ) r2(2 ^n +1 ) _ 

B r( ? )r(^+i) 7 r J^(r) log J^rjdr 



Proof. (37) follows directly from (36) while (38) is obtained from (12) and
[28, 9-28 (3)] (or [27, 8.5 (33)] or [29, 6.567-1]). Then, plugging expressions
(37) and (38) into (8) and using [29, 8.380-1] yields (39) for $p \ne 2$ and for any
m > n - 2.

In the Shannon case, we can again start from (9). Using the same technique
as in section 3.2 and results [29, 8.380-1 and 6.574-2], we obtain (40).

From the asymptotics [30, 9.1.7 and 9.2.1] of the Bessel function for small and
large argument, the integral converges provided that

$$q > \frac{4n}{m+n+2} \qquad (41)$$

Again, in the general case, neither (39) nor (40) can be further developed
analytically, and numerical integration is needed for the remaining
integral. Note the symmetry between (39)-(40) and (18)-(19) respectively,
which can be explained by the symmetry between the Student-t and
Student-r variables as evoked in [37].



3.3.1 The Student-r case with m = n degrees of freedom: the uniform case 

In this case, we directly apply formulas (39) and (40), where the remaining
integrals are numerically evaluated. Figure 2 then depicts the behavior of (40)
and (39) as a function of n, for q = 2.1 (near the Shannon case), q = 3 and
q = 10 respectively. The same behavior is observed for any value of p tested,
showing again that the lower bounds of the uncertainty relations are attained
as n increases. We are currently trying to complete this investigation analytically.
Note that we observe an increasingly slower convergence as q approaches 2
(especially for "small" m, e.g. uniform): we choose to present the case q = 2.1
since the convergence is not too slow compared to other values of q.
3.3.2 Some other Student-r cases



Again we do not enter into details in this subsection. However, other cases were
studied, as illustrated in Figure 3 for m = n + 2 and m = 2n and parameters
q = 2, q = 3 and q = 10 respectively, showing that the asymptotic behavior
holds. Contrary to the Student-t case, in the Student-r case we do not find any
specific case where a completely analytical study can be performed, although
$J_{k+\frac{1}{2}}$ admits an explicit formulation when $k \in \mathbb{N}$ [29, 8.462].



3.4 Discussion 



The preceding results show that any Student-t or Student-r vector asymptotically
reaches the case of equality in the Bialynicki-Birula & Mycielski inequality
or in its Renyi extension. As the only exact (finite dimensional) case
of equality is met by Gaussian vectors, one may be tempted to explain this
asymptotic behavior by a "Gaussianization effect" of these vectors, namely the
fact that they become "more and more Gaussian" in some sense as n increases.
In the rest of this paragraph, we study two possible measures of Gaussianization:
in the distribution sense and in the information divergence sense.



3.4.1 Convergence in distribution

A well-known property of a Student-t vector $X_n$ is that it can be expressed as
a Gaussian scale mixture [25], namely $^5$

$$X_n \stackrel{d}{=} \sqrt{A_m}\; G_n, \qquad (42)$$

where $A_m$ is a scalar inverse Gamma random variable $^6$ $\mathrm{inv}\Gamma\left(\frac{m}{2}, \frac{1}{2}\right)$ with shape
parameter m/2 and scale parameter 1/2, independent of the zero-mean Gaussian
vector $G_n$ with identity covariance matrix (see also [31,37,39]). In the
particular Cauchy case m = 1, one recovers the fact that $A_1$ is a Levy
variable [40]. Since
$E\left[\sqrt{m-2}\,\sqrt{A_m}\right] = \sqrt{\frac{m-2}{2}}\,\frac{\Gamma\left(\frac{m-1}{2}\right)}{\Gamma\left(\frac{m}{2}\right)} = 1 + O(1/m)$ and
$\mathrm{VAR}\left[\sqrt{m-2}\,\sqrt{A_m}\right] = 1 - \frac{m-2}{2}\left(\frac{\Gamma\left(\frac{m-1}{2}\right)}{\Gamma\left(\frac{m}{2}\right)}\right)^2 = \frac{1}{2m} + o(1/m)$, we deduce that
$\sqrt{m-2}\,\sqrt{A_m}$ converges to 1 almost surely when m goes to infinity. Hence, by
Slutsky's theorem, when m goes to infinity with n, any subvector $X_n^k$ of $X_n$
with finite dimension k < n converges in distribution to a Gaussian vector of
finite size k [41].

5 $\stackrel{d}{=}$ means equality in distribution.

6 The inverse Gamma distribution $\mathrm{inv}\Gamma(\alpha, \beta)$ writes $f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{-1-\alpha}\exp\left(-\frac{\beta}{x}\right)$:
it is the distribution of the inverse of a $\Gamma(\alpha, 1/\beta)$ random variable.
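The two moments quoted above are easy to evaluate. The following sketch assumes, as in (42), that $1/A_m$ is chi-square distributed with m degrees of freedom (so that $E[(m-2)A_m] = 1$ for m > 2); the function name is hypothetical.

```python
import math

def scaled_mixing_moments(m):
    """Mean and variance of sqrt((m - 2) * A_m), with 1 / A_m chi-square, m > 2."""
    mean = math.sqrt(0.5 * (m - 2)) * math.exp(
        math.lgamma(0.5 * (m - 1)) - math.lgamma(0.5 * m))
    variance = 1.0 - mean * mean  # uses E[(m - 2) A_m] = 1 for m > 2
    return mean, variance

if __name__ == "__main__":
    for m in (3.0, 10.0, 100.0, 1000.0):
        mean, variance = scaled_mixing_moments(m)
        print(m, mean, variance, 2.0 * m * variance)
```

The mean tends to 1 and the product `2 * m * variance` tends to 1, in agreement with the expansions $1 + O(1/m)$ and $\frac{1}{2m} + o(1/m)$.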

However, when m remains constant and n grows, since variable $A_m$ does not
depend on the dimension n, any subvector $X_n^k$ of $X_n$ of finite dimension k < n
remains Student-t with m degrees of freedom: as $n \to \infty$, no Gaussianization
effect happens for constant m, at least in the distribution sense. As an example,
this is the case for a Cauchy vector, for which it is well-known that any
subvector $X_n^k$ remains Cauchy distributed whatever n.

For the Student-r random vectors, the scale mixture representation does not
hold. However, it is shown in [37] that any Student-r vector $X_n$ can be expressed
as

$$X_n \stackrel{d}{=} \frac{G_n}{\left(\|G_n\|^2 + B_{m,n}\right)^{1/2}} \qquad (43)$$

where $B_{m,n}$ is a scalar Gamma distributed variable $^7$, with shape parameter
$\alpha = (m - n + 2)/2$ and scale parameter $\beta = 2$, which is independent of
the unit covariance Gaussian vector $G_n$. This representation was given in [37]
for m integer, but it holds for non-integer values of m as well. Now, variable
$C_m = \|G_n\|^2 + B_{m,n}$ is Gamma distributed $\Gamma\left(\frac{m+2}{2}, 2\right)$ and then, with the same
technique as in the Student-t case, $\sqrt{\frac{C_m}{m+2}} \to 1$ almost surely when $n \to \infty$
(since m > n - 2, $m + 2 \to \infty$ when $n \to \infty$). Although $C_m$ and $G_n$ are not
independent, one can again invoke Slutsky's theorem [41] to conclude that a
finite-size subvector $X_n^k$ of $X_n$ tends in distribution to a Gaussian vector
when n tends to infinity. This result is known as "Poincare's observation"
and gave birth to an important literature; despite its name, it is attributed to
Borel in [42] and to Mehler in [43]. It is illustrated in Figure 4 in the uniform
case for k = 1 and various values of n, and for k = 2 and n = 10.
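Poincare's observation can be illustrated without simulation in the uniform case with k = 1. The sketch below relies on two standard facts that are not stated in the text: the first coordinate of a vector uniformly distributed in the unit n-ball has a density proportional to $(1 - x^2)^{\frac{n-1}{2}}$ and a variance equal to $1/(n+2)$. It compares the density at the origin of the variance-normalized coordinate with that of a standard Gaussian; the function name is hypothetical.

```python
import math

def scaled_marginal_pdf_at_zero(n):
    """pdf at 0 of sqrt(n + 2) * X_1, with X uniform in the unit n-ball."""
    # The marginal pdf of X_1 at 0 is Gamma(n/2 + 1) / (sqrt(pi) Gamma((n + 1)/2)).
    log_ratio = math.lgamma(0.5 * n + 1.0) - math.lgamma(0.5 * (n + 1))
    return math.exp(log_ratio) / math.sqrt(math.pi * (n + 2))

# Density at 0 of a standard (unit variance) Gaussian variable.
gaussian_pdf_at_zero = 1.0 / math.sqrt(2.0 * math.pi)

if __name__ == "__main__":
    for n in (1, 2, 10, 100, 1000):
        print(n, scaled_marginal_pdf_at_zero(n), gaussian_pdf_at_zero)
```

Only the value at the origin is compared here; a fuller check would compare the whole marginal density, as done graphically in Figure 4.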



3.4.2 Convergence in the Kullback-Leibler divergence rate sense

The Kullback-Leibler (KL) divergence between a random vector $Y_n$ and a
random vector $Z_n$ with respective pdfs $p_y$ and $p_z$ is defined as

$$D_{kl}(Y_n \| Z_n) = \int_{\mathbb{R}^n} p_y(x)\,\log\left(\frac{p_y(x)}{p_z(x)}\right)dx \qquad (44)$$

and is a measure of dissimilarity between these two random vectors. This divergence
is nonnegative and is zero if and only if $p_z = p_y$ (i.e. $Z_n \stackrel{d}{=} Y_n$) [13]. This
divergence also has a physical meaning, as shown e.g. in [44,45]. Note that the
KL divergence is not symmetric. In the elliptical framework, its expression
can be simplified using the following lemma.

7 The Gamma distribution $\Gamma(\alpha, \beta)$ writes $f(x) = \frac{x^{\alpha-1}}{\beta^\alpha\,\Gamma(\alpha)}\exp\left(-\frac{x}{\beta}\right)$.

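Lemma 8 If $Y_n$ and $Z_n$ are elliptical with the same characteristic matrix
$\Sigma_n = I_n$, then

$$D_{kl}(Y_n \| Z_n) = D_{kl}\left(\|Y_n\| \,\big\|\, \|Z_n\|\right) \qquad (45)$$

Proof. By definition,

$$D_{kl}(Y_n \| Z_n) = \int_{\mathbb{R}^n} p_y(x) \log\left(\frac{p_y(x)}{p_z(x)}\right) dx = \frac{2\pi^{n/2}}{\Gamma(n/2)} \int_0^{+\infty} r^{n-1} d_y(r) \log\left(\frac{d_y(r)}{d_z(r)}\right) dr$$

with $p_y(x) = d_y\left((x^t x)^{1/2}\right)$ and using [29, 4.642]. But using the expression of
$p_{\|Y_n\|}(r)$ as given by (11), we deduce

$$D_{kl}(Y_n \| Z_n) = \int_0^{+\infty} p_{\|Y_n\|}(r) \log\left(\frac{p_{\|Y_n\|}(r)}{p_{\|Z_n\|}(r)}\right) dr = D_{kl}\left(\|Y_n\| \,\big\|\, \|Z_n\|\right)$$

■ For $\Sigma_n \neq I_n$, $Y_n$ and $Z_n$ must be replaced by $\Sigma_n^{-1/2} Y_n$ and $\Sigma_n^{-1/2} Z_n$ respectively.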
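Lemma 8 can be checked numerically on the simplest elliptical pair. The sketch below is only an illustration under stated assumptions: it takes two isotropic Gaussian vectors $N(0, s_y^2 I_n)$ and $N(0, s_z^2 I_n)$ (a choice made here, not one studied in the paper), for which the $n$-dimensional KL divergence has a standard closed form, and compares it with the one-dimensional KL divergence between the norms, computed by quadrature with SciPy. The helper names are hypothetical.

```python
import numpy as np
from scipy import integrate, stats

def kl_gaussians_closed_form(n, s_y, s_z):
    """KL divergence between N(0, s_y^2 I_n) and N(0, s_z^2 I_n)."""
    ratio = (s_y / s_z) ** 2
    return 0.5 * n * (ratio - 1.0 - np.log(ratio))

def kl_of_norms(n, s_y, s_z):
    """KL divergence between ||Y_n|| and ||Z_n|| by one-dimensional quadrature."""
    p_y = stats.chi(df=n, scale=s_y)   # law of the norm of N(0, s_y^2 I_n)
    p_z = stats.chi(df=n, scale=s_z)

    def integrand(r):
        if r <= 0.0:
            return 0.0
        return p_y.pdf(r) * (p_y.logpdf(r) - p_z.logpdf(r))

    # The integrand is negligible far beyond the typical norm s * sqrt(n).
    upper = 12.0 * max(s_y, s_z) * np.sqrt(n)
    value, _ = integrate.quad(integrand, 0.0, upper, limit=200)
    return value

if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, kl_gaussians_closed_form(n, 1.0, 1.5), kl_of_norms(n, 1.0, 1.5))
```

Both columns should agree up to quadrature error, which is what (45) predicts when the characteristic matrix is the identity.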

Applying this result to the Student-t vector X n under study and a zero-mean 
Gaussian vector G n with the same covariance matrix yields the following re- 
sults. 

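Theorem 9 For $m > 2$, the KL divergences between a Student-t vector and
a Gaussian vector with the same covariance matrix are given by

$$D_{kl}(X_n \| G_n) = \frac{n}{2}\log\left(\frac{2e}{m-2}\right) + \log\Gamma\left(\frac{n+m}{2}\right) - \log\Gamma\left(\frac{m}{2}\right) + \frac{n+m}{2}\left(\psi\left(\frac{m}{2}\right) - \psi\left(\frac{n+m}{2}\right)\right) \qquad (46)$$

and

$$D_{kl}(G_n \| X_n) = -\frac{n}{2}\log\left(\frac{2e}{m-2}\right) - \log\Gamma\left(\frac{n+m}{2}\right) + \log\Gamma\left(\frac{m}{2}\right) + \frac{n+m}{2}\,J(n) \qquad (47)$$

where

$$J(n) = \frac{\displaystyle\int_0^{+\infty} r^{n-1} \log(1 + r^2)\, e^{-\frac{m-2}{2} r^2}\, dr}{2^{\frac{n}{2}-1}\,(m-2)^{-\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)} \qquad (48)$$

Moreover$^8$, when $n \to \infty$ and $m \to \infty$ with $n$,

$$D_{kl}(X_n \| G_n) \sim \frac{n}{2}\log\left(\frac{m}{m-2}\right) + \frac{1}{2}\log\left(\frac{m}{n+m}\right) - \frac{n}{2m} - \frac{n(n+2m)}{6m^2(n+m)} \qquad (49)$$

whereas, when $m = O(1)$, we have

$$D_{kl}(X_n \| G_n) = \frac{n}{2}\left(\log\left(\frac{2}{m-2}\right) + \psi\left(\frac{m}{2}\right) + o(1)\right) \qquad (50)$$

and bounds for the function $J(n)$ are given by

$$J(n) \ge \psi\left(\frac{n}{2}\right) - \log\left(\frac{m-2}{2}\right) + \log\left(1 + \frac{m-2}{2}\,\frac{\Gamma^2\left(\frac{n}{2}\right)}{\Gamma^2\left(\frac{n+1}{2}\right)}\right) \qquad (51)$$

$$J(n) \le \log\left(1 + \frac{n}{m}\right) + \frac{2n}{(n+m)(m-2)} \qquad (52)$$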



Proof. Note first that if $m > 2$ the covariance matrix of $X_n$ with pdf (15)
is $\frac{1}{m-2} I_n$ [37]; if $m \le 2$, this covariance does not exist. Then, starting from
(45), using (16) for $X_n$ and (7)-(11) for $G_n$ (with covariance $\frac{1}{m-2} I_n$), the KL
divergence between $X_n$ and $G_n$ writes

$$D_{kl}(X_n \| G_n) = \frac{n}{2}\log 2 + \log\Gamma\left(\frac{n+m}{2}\right) - \log\Gamma\left(\frac{m}{2}\right) - \frac{n}{2}\log(m-2) + \frac{m-2}{2}\int_0^{+\infty} r^2\, p_{\|X_n\|}(r)\, dr - \frac{n+m}{2}\int_0^{+\infty} \frac{2\,\Gamma\left(\frac{n+m}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\Gamma\left(\frac{m}{2}\right)}\, \frac{r^{n-1}}{(1+r^2)^{\frac{n+m}{2}}}\, \log(1+r^2)\, dr$$

The first integral is equal to $\frac{n}{m-2}$, while the second one is evaluated by noticing
that $\int h^k(r)\log(h(r))\,dr = \left.\frac{\partial}{\partial\lambda}\int h^{\lambda}(r)\,dr\,\right|_{\lambda=k}$ and with [29, 8.380-3], leading
to (46).

The KL divergence between $G_n$ and $X_n$, (47), is derived with the same technique.

8 $f \sim g$ means that $f/g \to 1$, or $f = g + o(g)$.

Using the asymptotics [30, 6.1.41 and 6.3.18] of the log-gamma and of the psi
functions up to the second order term, with tedious algebra, (49) follows.

One can easily check that for any value $a$ and for $r > 0$ we have

$$\log(1 + r^2) \le \log(1 + a^2) + \frac{r^2 - a^2}{1 + a^2}$$

Using this inequality with $a^2 = n/m$ in $J(n)$ leads to (52). Likewise, $\log(1 + r^2) = 2\log r + \log(1 + r^{-2})$
and for any value $a > 0$ and for $r > 0$ we have

$$\log(1 + r^{-2}) \ge \log(1 + a^{-2}) - \frac{2\,(r - a)}{a\,(1 + a^2)}$$

Hence, plugging this inequality into $J(n)$ and using [29, 4.352-1] leads to

$$J(n) \ge \psi\left(\frac{n}{2}\right) - \log\left(\frac{m-2}{2}\right) + \log(1 + a^{-2}) - \frac{2}{a\,(1+a^2)}\left(\sqrt{\frac{2}{m-2}}\,\frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)} - a\right)$$

The best bound is obtained by maximizing the right-hand side, which amounts
to choosing $a = \sqrt{\frac{2}{m-2}}\,\frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}$, leading to (51). ■
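The closed form of Theorem 9 and the bounds on $J(n)$ can be cross-checked numerically. The sketch below rests on several assumptions made for illustration: the Student-t pdf (15) is taken proportional to $(1 + x^t x)^{-(n+m)/2}$, (46) is read with its digamma term $\frac{n+m}{2}(\psi(\frac{m}{2}) - \psi(\frac{n+m}{2}))$, and (51)-(52) are read as $\psi(\frac{n}{2}) - \log\frac{m-2}{2} + \log(1 + 1/E[\|G_n\|]^2)$ and $\log(1 + \frac{n}{m}) + \frac{2n}{(n+m)(m-2)}$. The function names are hypothetical. The KL divergence is recomputed through the radial reduction of Lemma 8 and compared with the closed form.

```python
import numpy as np
from scipy import integrate
from scipy.special import gammaln, digamma

def kl_closed_form(n, m):
    """D_kl(X_n || G_n) for m > 2, assumed reading of (46)."""
    a, b = 0.5 * (n + m), 0.5 * m
    return (0.5 * n * np.log(2.0 * np.e / (m - 2.0))
            + gammaln(a) - gammaln(b) + a * (digamma(b) - digamma(a)))

def log_p_norm_student(r, n, m):
    """log pdf of ||X_n|| for a pdf proportional to (1 + x^t x)^(-(n+m)/2)."""
    return (np.log(2.0) + gammaln(0.5 * (n + m)) - gammaln(0.5 * n) - gammaln(0.5 * m)
            + (n - 1.0) * np.log(r) - 0.5 * (n + m) * np.log1p(r * r))

def log_p_norm_gauss(r, n, m):
    """log pdf of ||G_n|| when G_n has covariance I_n / (m - 2)."""
    return (np.log(2.0) + 0.5 * n * np.log(0.5 * (m - 2.0)) - gammaln(0.5 * n)
            + (n - 1.0) * np.log(r) - 0.5 * (m - 2.0) * r * r)

def kl_quadrature(n, m):
    """Same divergence through the radial form (45)."""
    def integrand(r):
        lp = log_p_norm_student(r, n, m)
        return np.exp(lp) * (lp - log_p_norm_gauss(r, n, m))
    return integrate.quad(integrand, 0.0, np.inf, limit=500)[0]

def j_exact(n, m):
    """J(n) = E[log(1 + ||G_n||^2)] by quadrature."""
    f = lambda r: np.exp(log_p_norm_gauss(r, n, m)) * np.log1p(r * r)
    return integrate.quad(f, 0.0, np.inf, limit=500)[0]

def j_bounds(n, m):
    """Lower and upper bounds on J(n), assumed readings of (51) and (52)."""
    log_mean_r = (0.5 * np.log(2.0 / (m - 2.0))
                  + gammaln(0.5 * (n + 1)) - gammaln(0.5 * n))   # log E[||G_n||]
    lower = (digamma(0.5 * n) - np.log(0.5 * (m - 2.0))
             + np.log1p(np.exp(-2.0 * log_mean_r)))
    upper = np.log1p(n / m) + 2.0 * n / ((n + m) * (m - 2.0))
    return lower, upper

if __name__ == "__main__":
    for n, m in [(1, 5), (3, 6), (10, 8)]:
        lower, upper = j_bounds(n, m)
        print(n, m, kl_closed_form(n, m), kl_quadrature(n, m),
              lower, j_exact(n, m), upper)
```

Agreement between the two divergence columns supports the reading of (46) used here, and the quadrature value of $J(n)$ should sit between the two bounds.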

Results from the above theorem show that $D_{kl}$ does not generally tend to zero
with $n$. However, as previously noted, it is more significant to study the KL
divergence rates $\frac{1}{n} D_{kl}$. Three situations occur:

• If $n = o(m)$, (49) shows that $D_{kl}(X_n \| G_n)$ tends to 0. Thus $D_{kl}(X_n \| G_n)/n$
  also tends to 0 when $n$ increases. This behavior shows that a Gaussianization
  effect happens both in the KL divergence sense and in the KL divergence rate
  sense. Using the asymptotics [30, 6.1.41 and 6.3.18], we obtain that
  $D_{kl}(G_n \| X_n) = \frac{n+m}{2}\left(J(n) - \log(1 + n/m)\right) + \varepsilon(n)$ where $\varepsilon(n) = o(1)$. Using
  (52) leads to $D_{kl}(G_n \| X_n) \le \frac{n}{m-2} + \varepsilon(n)$. Together with the nonnegativity of
  the KL divergence, this shows that $D_{kl}(G_n \| X_n)$ tends to 0 when $n \to \infty$: this
  confirms the conclusion drawn from $D_{kl}(X_n \| G_n)$.

• If $m \to \infty$ with $m = O(n)$, (49) tells us that the KL divergence $D_{kl}(X_n \| G_n)$
  has a finite non-zero limit or can even diverge (e.g. if $m = o(n)$): again
  the rate $\frac{1}{n} D_{kl}(X_n \| G_n)$ tends to 0 when $n \to \infty$. In this situation, there is
  no Gaussianization in the KL divergence sense, but in divergence rate the
  Gaussianization effect remains. From (51), the same technique as in the first
  case shows that the lower bound of $D_{kl}(G_n \| X_n)$ tends to a non-zero limit
  (and clearly diverges if $m = o(n)$). From the upper bound (52), it appears
  that $\frac{1}{n} D_{kl}(G_n \| X_n)$ tends to 0 and the conclusion drawn from $D_{kl}(X_n \| G_n)$
  holds.

• If $m = O(1)$ (e.g. $m$ constant),
  • from (50) one can conclude that no Gaussianization appears, neither in
    KL divergence, nor in KL divergence rate.
  • from the lower bound (51), one can check that $D_{kl}(G_n \| X_n)$ diverges with
    $n$. However, from (52) it appears that $\frac{1}{n} D_{kl}(G_n \| X_n)$ tends to 0. This
    behavior seems to contradict the previous one and indicates that a
    Gaussianization effect does exist in KL divergence rate. This apparent
    contradiction is possible since the KL divergence is not symmetric. Note also
    that when $m \le 2$, (47), (51) and (52) can still be considered without the
    normalization $m - 2$ (no covariance for $X_n$): the conclusion holds in this case.

The KL divergence rate $\frac{1}{n} D_{kl}(X_n \| G_n)$ is in concordance with the observation
of the Gaussianization effect in the distribution sense. This generally holds for
$\frac{1}{n} D_{kl}(G_n \| X_n)$, except notably when $m$ does not go to infinity: in this case,
although there is no Gaussianization in distribution, the KL divergence rate
goes to 0 with $n$. Furthermore, when $m$ goes to infinity, although $X_n$ asymptotically
reaches the bound in the B.B.M.I., its distribution generally stays at an
infinite, or at least non-zero, "distance" (in the KL divergence sense) from the
Gaussian distribution. Thus, much care must be taken with these conclusions,
especially about the real meaning of the KL divergence rate.
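The three regimes discussed above can be tabulated directly from the closed form. This is a minimal sketch, assuming (46) is read as $\frac{n}{2}\log\frac{2e}{m-2} + \log\Gamma(\frac{n+m}{2}) - \log\Gamma(\frac{m}{2}) + \frac{n+m}{2}(\psi(\frac{m}{2}) - \psi(\frac{n+m}{2}))$ and (50) as $\frac{n}{2}(\log\frac{2}{m-2} + \psi(\frac{m}{2}) + o(1))$; the particular choices $m = n^2$, $m = n$ and $m = 4$ are illustrative only.

```python
import numpy as np
from scipy.special import gammaln, digamma

def kl_student_t_gauss(n, m):
    """D_kl(X_n || G_n) for m > 2, assumed reading of (46)."""
    a, b = 0.5 * (n + m), 0.5 * m
    return (0.5 * n * np.log(2.0 * np.e / (m - 2.0))
            + gammaln(a) - gammaln(b) + a * (digamma(b) - digamma(a)))

def rate_limit_fixed_m(m):
    """Limit of D_kl(X_n || G_n) / n for fixed m, assumed reading of (50)."""
    return 0.5 * (np.log(2.0 / (m - 2.0)) + digamma(0.5 * m))

if __name__ == "__main__":
    regimes = {
        "m = n^2  (n = o(m))": lambda n: float(n) ** 2,
        "m = n    (m = O(n))": lambda n: float(n),
        "m = 4    (m = O(1))": lambda n: 4.0,
    }
    for name, m_of_n in regimes.items():
        for n in (10, 100, 1000):
            d = kl_student_t_gauss(float(n), m_of_n(n))
            print(f"{name}  n={n:5d}  D_kl={d:.3e}  D_kl/n={d / n:.3e}")
    print("fixed-m rate limit for m = 4:", rate_limit_fixed_m(4.0))
```

In the first regime both the divergence and its rate shrink, in the second the divergence settles to a positive constant while the rate vanishes, and in the third the rate stays away from zero, in line with the discussion of $D_{kl}(X_n \| G_n)$.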

Theorem 10 For any $m > n - 2$, the KL divergence between a Student-r
vector and a Gaussian vector with the same covariance matrix is

$$D_{kl}(X_n \| G_n) = \frac{n}{2}\log\left(\frac{2e}{m+2}\right) + \log\Gamma\left(\frac{m}{2} + 1\right) - \log\Gamma\left(\frac{m-n}{2} + 1\right) + \frac{m-n}{2}\left(\psi\left(\frac{m-n}{2} + 1\right) - \psi\left(\frac{m}{2} + 1\right)\right) \qquad (53)$$

Moreover, it verifies the asymptotic

$$D_{kl}(X_n \| G_n) \sim \frac{1}{2}\log\left(\frac{m+2}{m-n+2}\right) - \frac{n(m-n)}{2(m+2)(m-n+2)} \qquad (54)$$

Proof. Note first that the covariance matrix of $X_n$ as defined by (36) is $\frac{1}{m+2} I_n$
[37]. Then, (53) is obtained in the same way as (46).

Notice moreover that here, since $m > n - 2$, $m$ goes to infinity with $n$. Using, as
in the Student-t case, the asymptotic expansions [30, 6.1.41 and 6.3.18] of the
log-gamma and of the psi functions up to the second order term, (54) follows.
■

Thus, in the Student-r case, two behaviors arise:

• If $n = o(m)$, from (54) $D_{kl}(X_n \| G_n)$ tends to zero with $n$, hence the KL
  divergence rate also goes to 0: a Gaussianization effect appears both in the
  KL divergence and in the KL divergence rate sense.
• If $n = O(m)$, the KL divergence does not tend to zero and can even diverge
  with $n$ if $m - n = o(n)$ (e.g. in the uniform case). But $D_{kl}(X_n \| G_n)/n$ still
  tends to 0 with $n$, exhibiting again an asymptotic Gaussianization behavior.
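The two Student-r behaviors can be illustrated from the closed form. The sketch below assumes (53) reads $\frac{n}{2}\log\frac{2e}{m+2} + \log\Gamma(\frac{m}{2}+1) - \log\Gamma(\frac{m-n}{2}+1) + \frac{m-n}{2}(\psi(\frac{m-n}{2}+1) - \psi(\frac{m}{2}+1))$; the function name and the sampled values of $n$ are illustrative. For $n = m = 1$ the formula can be compared with a direct computation for a uniform variable on $[-1, 1]$ against a Gaussian of variance $1/3$, namely $\frac{1}{2} - \log 2 + \frac{1}{2}\log\frac{2\pi}{3}$.

```python
import numpy as np
from scipy.special import gammaln, digamma

def kl_student_r_gauss(n, m):
    """D_kl(X_n || G_n) for m > n - 2, assumed reading of (53)."""
    a, c = 0.5 * m + 1.0, 0.5 * (m - n) + 1.0
    return (0.5 * n * np.log(2.0 * np.e / (m + 2.0))
            + gammaln(a) - gammaln(c) + (c - 1.0) * (digamma(c) - digamma(a)))

if __name__ == "__main__":
    print("uniform case (m = n): divergence grows, rate vanishes")
    for n in (10, 100, 1000, 10000):
        d = kl_student_r_gauss(float(n), float(n))
        print(f"  n={n:6d}  D_kl={d:.4f}  D_kl/n={d / n:.2e}")
    print("n = o(m) with m = n^2: divergence vanishes")
    for n in (10, 100):
        print(f"  n={n:6d}  D_kl={kl_student_r_gauss(float(n), float(n) ** 2):.3e}")
```

In the uniform case the divergence grows roughly like $\frac{1}{2}\log n$, so it diverges while the rate still goes to 0.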

These observations can be linked to a famous result by Diaconis and Freedman
[42] that bounds the total variation divergence$^9$ between $X_n$ and $G_n$ as

$$D_{tv}(X_n, G_n) \le \frac{2(n+3)}{m-n-1} \qquad \forall\, 1 \le n \le m-2 \qquad (55)$$

for integer $m$, where $X_n$ is built from the first $n$ components of an
$(m+2)$-dimensional vector uniformly distributed on the surface of the
$(m+2)$-dimensional sphere [42]. Moreover, a necessary and sufficient condition for
convergence to 0 of $D_{tv}(X_n, G_n)$ is $n = o(m)$. These results were extended to
the KL divergence by O. Johnson [43] as follows:

$$D_{kl}(X_n \| G_n) \le \log\left(\ldots\right) + \ldots \qquad (56)$$

A converse result is also provided in [43], stating that if $D_{kl}(X_n \| G_n) \to 0$
then $n = o(m)$.
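The bound (55) can be compared with a direct computation in the simplest case $n = 1$. This is a minimal sketch under stated assumptions: the pdf of the first coordinate of a uniform vector on the unit sphere of $\mathbb{R}^{m+2}$ is taken as $\frac{\Gamma(m/2+1)}{\sqrt{\pi}\,\Gamma((m+1)/2)}(1 - x^2)^{(m-1)/2}$ on $[-1, 1]$, the Gaussian has variance $1/(m+2)$, the total variation divergence is taken as the $L^1$ distance between the two pdfs, and (55) is read as $2(n+3)/(m-n-1)$. The function names are hypothetical.

```python
import numpy as np
from scipy import integrate, stats
from scipy.special import gammaln

def student_r_pdf_1d(x, m):
    """Assumed pdf of the first coordinate of a uniform vector on the unit sphere of R^(m+2)."""
    log_c = gammaln(0.5 * m + 1.0) - 0.5 * np.log(np.pi) - gammaln(0.5 * (m + 1.0))
    return np.exp(log_c + 0.5 * (m - 1.0) * np.log1p(-x * x))

def total_variation_1d(m):
    """L1 distance between that pdf and the Gaussian pdf of variance 1/(m+2)."""
    gauss = stats.norm(scale=1.0 / np.sqrt(m + 2.0))
    inside, _ = integrate.quad(
        lambda x: abs(student_r_pdf_1d(x, m) - gauss.pdf(x)), -1.0, 1.0, limit=200)
    outside = 2.0 * gauss.sf(1.0)     # Gaussian mass outside [-1, 1]
    return inside + outside

def diaconis_freedman_bound(n, m):
    """Right-hand side of (55), valid for 1 <= n <= m - 2."""
    return 2.0 * (n + 3.0) / (m - n - 1.0)

if __name__ == "__main__":
    for m in (5, 10, 50, 200):
        print(m, total_variation_1d(m), diaconis_freedman_bound(1, m))
```

For $n = 1$ the computed distance should decrease with $m$ and stay below the bound, which is consistent with the condition $n = o(m)$.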



9 We recall that the total variation divergence is the $L^1$-norm difference $D_{tv}(Y, Z) = \int_{\mathbb{R}^n} |p_y - p_z|$.

4 Concluding remarks and discussion

In this paper, we first extended the entropic uncertainty relation found by
Bialynicki-Birula & Mycielski to Renyi entropies. We checked that, for
a given dimension $n$, the bound is again attained in the Gaussian case and in
this case only. We analytically proved that the bound is also asymptotically
attained with the dimension $n$ in the conjugate multivariate exponential power
case, whatever $p > 1$. We numerically showed that the bound is asymptotically
attained in the Cauchy case for any value of $p > 2$, extending the results
of Abe, as well as in the general Student-t case, which includes these two cases.
This asymptotic analysis was confirmed analytically. The same conclusion was
drawn in the Student-r context, as far as our numerical simulations are valid.
These results seem to contradict the fact that the bound is attained only in the
Gaussian case. While a Gaussianization effect could be invoked in the first example,
the second one showed that the effect of dimension alone may be suspected,
provided some favorable conditions exist. To get a feeling for this interpretation,
let us come back to the Beckner relation (4), and assume that we deal with a
wave function $\Psi_n$ such that



(C M ) n = h(n,p) + o(h(n,p)) (57) 



for large $n$ and for some function $h(n,p)$. Again by taking the logarithm and
using the same approach as in section 2, it is easy to show that

$$\frac{H_{p/2}(X_n) + H_{q/2}(\hat{X}_n)}{n} = \log(2\pi) + \frac{\log p}{p-2} + \frac{\log q}{q-2} + \frac{2p\log(h(n,p))}{n(2-p)} + o(1)$$



As a conclusion, it is sufficient that $\frac{\log(h(n,p))}{n}$ tends to 0 as $n$ increases: $h(n,p)$
does not need to converge to 1, showing that no Gaussianization effect is
needed here; in fact, $h(n,p)$ can even diverge with $n$. This conclusion holds in
the Shannon case, provided that $\frac{2p\log(h(n,p))}{n(2-p)}$ (and the remaining term) has a limit
as $p \to 2$. In fact, in the conjugate exponential power case (Student-t with
$m = n+2$), the function $h(n,p)$ does not depend on $n$, $h(n,p) = h(p)$, illustrating this
conclusion. As a perspective, the Student-r case should be solved analytically
to confirm the numerical investigations. To go further, it seems interesting
to determine the minimal conditions a pdf $f_n$ should satisfy so that
it asymptotically attains the bound in the B.B.M.I. As far as we know, this
question remains open and we are still investigating it. We suspect however that in
the elliptical context, no major additional constraints are needed to reach the
same conclusion. Indeed, we feel that in this framework the normalization term
$1/n$ in the entropy rates may be strong: one can invoke Poincare's observation
inducing a "Gaussianization" and we suspect that the remaining contribution
of the sum of the entropies can diverge, but at most in $o(n)$.
Fig. 1. Illustration of the uncertainty relation (6) in the Cauchy context, for p = 2
(Shannon version (2)), p = 3 and for p = 10 respectively. The solid line depicts the
sum of the entropy rates $U_p(X_n)$ as a function of $n$, and the dashed line represents
the lower bound.

Fig. 2. Illustration of the uncertainty relation (6) in the uniform case, for q = 2.1
(near the Shannon version (2)), q = 3 and for q = 10 respectively. The solid line
depicts $U_p(X_n)$ as a function of $n$ and the dashed line depicts the lower bound.

Fig. 3. Illustration of the uncertainty relation (6) in several Student-r contexts, for
q = 2.1 (near the Shannon version (2)), q = 3 and for q = 10 respectively. The
figures represent the cases m = n + 2 (solid line) and m = 2n (dashed-dotted line).
The small dashed line represents the lower bound.